Submitted to the Bernoulli 
arXiv: 1301.0676



Strong Consistency of Factorial K-means 
Clustering 

YOSHIKAZU TERADA 

Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, 
Toyonaka, Osaka, Japan 

E-mail: terada@sigmath.es.osaka-u.ac.jp

Factorial k-means (FKM) clustering is a method for clustering objects in a low-dimensional
subspace. The advantage of this method is that the partition of objects and the low-dimensional
subspace reflecting the cluster structure are obtained simultaneously. Conditions that ensure
the almost sure convergence of the estimator of FKM clustering as the sample size increases
unboundedly are derived. The result is proved for a more general model that includes FKM clustering.

1. Introduction

If we apply a cluster analysis to data, it is highly unlikely that all variables relate to 
the same cluster structure. Hence, it is sometimes beneficial to regard the true cluster 
structure of interest as lying in a low-dimensional subspace of the data. In these cases, 
researchers often apply the following two-step procedure: 

Step 1. Carry out principal component analysis (PCA) and obtain the first few compo- 
nents. 

Step 2. Perform the usual k-means clustering for the principal scores on the first few
principal components, which are obtained in Step 1.

This procedure is called "tandem clustering" by Arabie and Hubert (1999). Several authors
warn against the use of tandem clustering (e.g., Arabie and Hubert (1999); Chang
(1994); De Soete and Carroll (1994)). The first few principal components of PCA do not
necessarily reflect the cluster structure in the data. Thus, an appropriate clustering result
might not be obtained using this procedure.
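The two-step procedure can be sketched in a few lines. The following is only an illustrative sketch, assuming column-centered PCA computed by a singular value decomposition and a plain Lloyd-type k-means; the function names `kmeans` and `tandem_clustering` are hypothetical and do not come from the cited papers.

```python
import numpy as np

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd-type k-means on the rows of Y; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)].copy()
    labels = np.zeros(len(Y), dtype=int)
    for it in range(n_iter):
        # Squared distances between every point and every center.
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = Y[labels == j].mean(axis=0)
    return labels, centers

def tandem_clustering(X, k, q, seed=0):
    """Step 1: PCA scores on the first q components. Step 2: k-means on the scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:q].T
    labels, centers = kmeans(scores, k, seed=seed)
    return scores, labels, centers
```

Because Step 1 never looks at the cluster structure, the retained components are simply the directions of largest variance, which is the reason for the warnings cited above.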

Instead of a two-step procedure, such as tandem clustering, some methods that perform
cluster analysis and dimension reduction simultaneously have been proposed (e.g.,
De Soete and Carroll (1994); Vichi and Kiers (2001)). De Soete and Carroll (1994) proposed
reduced k-means (RKM) clustering, which includes conventional k-means clustering
as a special case. For given data points $x_1, \dots, x_n$ in $\mathbb{R}^p$, the fixed cluster number
$k$, and the dimension number of the subspace $q$ ($q < \min\{k - 1, p\}$), the objective function
of RKM clustering is defined by

$$RKM_n(F, A) := \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} \|x_i - A f_j\|^2,$$

where $f_j \in \mathbb{R}^q$, $F = \{f_1, \dots, f_k\} \subset \mathbb{R}^q$, $A$ is a $p \times q$ column-wise orthonormal matrix, and
$\|\cdot\|$ represents the usual norm. Under certain regularity conditions, RKM clustering has
strong consistency (Terada (2012)). However, when the data matrix $X = (x_{ij})_{n \times p}$ has
full rank, i.e., $\mathrm{rank}(X) = p$, RKM clustering may fail to find a subspace that reflects the
cluster structure. Indeed, RKM clustering has been applied to data composed of a total 
of 12 independent variables (Figure 1), which consists of 2 variables actually related to 
the cluster structure and 10 noise variables. The result of RKM clustering for the data 
shown in Figure 1 is given in Figure 2. The results indicate that the low-dimensional 
subspace revealed does not reflect the actual cluster structure and that the clustering 
result is, in fact, incorrect. 

Figure 2. Plot of the result of RKM clustering for the artificial data given in Figure 1, where the black
points represent misclassified objects.

Vichi and Kiers (2001) pointed out the possibility of such problems with the RKM
clustering method and proposed a new clustering method, called factorial k-means (FKM)
clustering. For the given data points $x_1, \dots, x_n$ in $\mathbb{R}^p$, the number of clusters $k$, and
the number of dimensions of the subspace $q$, FKM clustering is defined by the minimization
of the following loss function:

$$FKM_n(F, A \mid k, q) := \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} \|A^T x_i - f_j\|^2,$$



where $F := \{f_1, \dots, f_k\}$, $f_j \in \mathbb{R}^q$, and $A$ is a $p \times q$ column-wise orthonormal matrix.
When the given data points are independently drawn from a population
distribution $P$, we can rewrite the FKM objective function as

$$FKM(F, A, P_n) := \int \min_{f \in F} \|A^T x - f\|^2 \, P_n(dx),$$

where $P_n$ is the empirical measure of the data points $x_1, \dots, x_n$ in $\mathbb{R}^p$. For each set of
cluster centers $F$ and each $p \times q$ orthonormal matrix $A$, we obtain

$$\lim_{n \to \infty} FKM(F, A, P_n) = FKM(F, A, P) := \int \min_{f \in F} \|A^T x - f\|^2 \, P(dx) \quad \text{a.s.}$$

by the strong law of large numbers (SLLN). Thus, as with RKM clustering, the global
minimizers of $FKM(\cdot, \cdot, P_n)$ are also expected to converge almost surely to the global
minimizers of $FKM(\cdot, \cdot, P)$, say the population global minimizers. In this paper, we derive
sufficient conditions for the existence of population global minimizers and then prove the
strong consistency of FKM clustering under some regularity conditions. As a framework
for the proof, we use the one that Terada (2012) proposed for the strong consistency of
RKM clustering.
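The pointwise convergence of the empirical objective for a fixed pair $(F, A)$ can be illustrated numerically. The following sketch is an illustration under stated assumptions, not part of the proof: $P$ is taken to be the standard normal distribution on $\mathbb{R}^3$, $A$ selects the first two coordinates, and $F$ consists of the origin only, so that the population value is $E\|A^T x\|^2 = 2$. The name `fkm_objective` is hypothetical.

```python
import numpy as np

def fkm_objective(X, F, A):
    """FKM(F, A, P_n): average over the rows x of X of min_f ||A^T x - f||^2."""
    Y = X @ A                                        # projected points, n x q
    d2 = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).mean())

rng = np.random.default_rng(0)
p, q = 3, 2
A = np.eye(p)[:, :q]       # column-wise orthonormal p x q matrix (assumed choice)
F = np.zeros((1, q))       # a single center at the origin (assumed choice)

# Empirical objective for increasing sample sizes; the population value is q = 2.
values = {n: fkm_objective(rng.standard_normal((n, p)), F, A)
          for n in (100, 10_000, 200_000)}
```

This only checks convergence at one fixed $(F, A)$; consistency of the minimizers needs the uniform version of the SLLN.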

The rest of the paper is organized as follows. In Section 2, we describe the clustering 
algorithm of FKM and the relationship between RKM clustering and FKM clustering. 
We introduce prerequisites and notation in Section 3. In Section 4, we prove the uniform 
SLLN and the continuity of the objective function of FKM clustering. The sufficient 
condition for the existence of the population global minimizers and the strong consistency 
theorem of FKM clustering are stated in Section 5. In Section 6, we provide the main 
proof of the theorem.

2. Factorial K-means clustering

We will denote the number of objects and that of variables by $n$ and $p$. Let $X = (x_{ij})_{n \times p}$
be a data matrix and $x_i$ ($i = 1, \dots, n$) be the row vectors of $X$. For a given number of
clusters $k$ and a given number of dimensions of the subspace $q$, the objective function of FKM
clustering is defined by

$$FKM_n(A, F, U \mid k, q) := \|XA - UF\|_F^2 = \sum_{i=1}^{n} \min_{1 \le j \le k} \|A^T x_i - f_j\|^2,$$

where $\|\cdot\|_F$ denotes the Frobenius norm, $U = (u_{ij})_{n \times k}$ is a binary membership matrix,
$A$ is a $p \times q$ column-wise orthonormal loading matrix, $F = (f_{ij})_{k \times q}$ is a centroid matrix,
and $f_j$ ($j = 1, \dots, k$) are row vectors of $F$ representing the $j$th cluster center. $FKM_n$
can be minimized by the following alternating least-squares algorithm:

Step 0. First, initial values are chosen for A, F, and U. 

Step 1. For each $i = 1, \dots, n$ and each $j = 1, \dots, k$, we update $u_{ij}$ by

$$u_{ij} = \begin{cases} 1 & \text{iff } \|A^T x_i - f_j\|^2 < \|A^T x_i - f_l\|^2 \text{ for each } l \ne j, \\ 0 & \text{otherwise.} \end{cases}$$

Step 2. $A$ is updated by the first $q$ eigenvectors of $X^T [U(U^T U)^{-1} U^T - I_n] X$, where
$I_n$ is the $n$-dimensional identity matrix.

Step 3. $F$ is updated using $(U^T U)^{-1} U^T X A$.

Step 4. Finally, the value of the function FKMn for the present values of A, F, and 
U is computed. If the function value has decreased, the values of A, F, and U are 
updated in accordance with Steps 1-3. Otherwise, the algorithm has converged. 
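The alternating least-squares steps above can be sketched as follows. This is a minimal sketch rather than a reference implementation, and it rests on several assumptions that the text does not fix: the initial $A$ is a random orthonormal matrix, the initial centers are $k$ randomly chosen projected points, ties in Step 1 are broken by the smallest index, an empty cluster is given a center at the origin, and "first $q$ eigenvectors" is read as the eigenvectors belonging to the $q$ largest eigenvalues. The name `fkm_als` and its parameters are hypothetical.

```python
import numpy as np

def fkm_als(X, k, q, max_iter=100, tol=1e-10, seed=0):
    """Alternating least squares for ||XA - UF||_F^2; returns (A, F, labels, history)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step 0: initial values (assumed: random orthonormal A, random projected points as F).
    A, _ = np.linalg.qr(rng.standard_normal((p, q)))
    F = (X @ A)[rng.choice(n, size=k, replace=False)]
    history = []
    for _ in range(max_iter):
        # Step 1: assign each object to the nearest center in the subspace.
        Y = X @ A
        d2 = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        U = np.eye(k)[labels]                       # n x k binary membership matrix
        counts = np.maximum(U.sum(axis=0), 1.0)     # guard against empty clusters
        # Step 2: eigenvectors of X^T [U (U^T U)^{-1} U^T - I_n] X for the q largest
        # eigenvalues, computed through the k x p matrix of cluster sums U^T X.
        M = U.T @ X
        S = M.T @ (M / counts[:, None]) - X.T @ X
        _, V = np.linalg.eigh(S)                    # eigenvalues in ascending order
        A = V[:, ::-1][:, :q]
        # Step 3: centroids (U^T U)^{-1} U^T X A.
        F = (M / counts[:, None]) @ A
        # Step 4: stop when the loss no longer decreases.
        loss = float(((X @ A - F[labels]) ** 2).sum())
        history.append(loss)
        if len(history) > 1 and history[-2] - loss <= tol:
            break
    return A, F, labels, history
```

Each step can only lower the loss or leave it unchanged, so the recorded history is non-increasing. The result is a local minimizer that depends on the initial values, so several random starts are advisable in practice.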

Let $\hat{A}$, $\hat{F}$, and $\hat{U}$ denote the optimal parameters of FKM clustering. We can visualize
the low-dimensional subspace that reflects the cluster structure by $X\hat{A}$. Figure 3 represents
such a visualization of the optimal subspace that results from FKM clustering for
the artificial data given in Figure 1.

Next, we briefly discuss the relationship between RKM clustering and FKM clustering.
The objective function of RKM clustering is defined by

$$RKM_n(A, F, U) := \|X - UFA^T\|_F^2 = \sum_{i=1}^{n} \min_{1 \le j \le k} \|x_i - A f_j\|^2.$$

This objective function can be decomposed into two terms:

$$RKM_n(A, F, U) = \|X - XAA^T\|_F^2 + \|XA - UF\|_F^2. \qquad (2.1)$$

The first term of equation (2.1) is the objective function of the PCA procedure, and the
second term is that of FKM clustering. Thus, FKM clustering reveals the low-dimensional
subspace reflecting the cluster structure more clearly than the subspace of RKM clustering
in some cases. For more details about the relationship between RKM and FKM
clustering, see Timmerman et al. (2010).
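Decomposition (2.1) holds for every column-wise orthonormal $A$, every centroid matrix $F$, and every membership matrix $U$, because $(X - XAA^T)A = 0$ makes the cross term vanish. The following sketch checks this numerically on arbitrary random inputs; the helper name `decomposition_terms` and the random test data are assumptions made for illustration only.

```python
import numpy as np

def decomposition_terms(X, A, F, U):
    """Return (RKM loss, PCA term, FKM term) appearing in decomposition (2.1)."""
    rkm = np.linalg.norm(X - U @ F @ A.T, "fro") ** 2
    pca_term = np.linalg.norm(X - X @ A @ A.T, "fro") ** 2
    fkm_term = np.linalg.norm(X @ A - U @ F, "fro") ** 2
    return rkm, pca_term, fkm_term

rng = np.random.default_rng(1)
n, p, q, k = 50, 6, 2, 3
X = rng.standard_normal((n, p))
A, _ = np.linalg.qr(rng.standard_normal((p, q)))   # column-wise orthonormal
F = rng.standard_normal((k, q))                    # arbitrary centroids
U = np.eye(k)[rng.integers(0, k, size=n)]          # arbitrary binary memberships

rkm, pca_term, fkm_term = decomposition_terms(X, A, F, U)
```

Up to floating-point error, `rkm` equals `pca_term + fkm_term`.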
Figure 3. Plot of the result of FKM clustering for the artificial data given in Figure 1.



3. Preliminaries 

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $X_1, \dots, X_n$ be i.i.d. $p$-dimensional random
variables drawn from a distribution $P$. Let $P_n$ denote the empirical measure based on
$X_1, \dots, X_n$. The set of all $p \times q$ column-wise orthonormal matrices will be denoted
by $O(p \times q)$. $B_q(r)$ denotes the $q$-dimensional closed ball of radius $r$ centered at the
origin. We will define $\mathcal{R}_k := \{R \subset \mathbb{R}^q \mid \#(R) \le k\}$, where $\#(R)$ is the cardinality
of $R$. We will denote the parameter space by $\Xi_k := \mathcal{R}_k \times O(p \times q)$. For each $M > 0$, let
$\mathcal{R}^*_k(M) := \{R \subset \mathbb{R}^q \mid \#(R) \le k \text{ and } R \subset B_q(M)\}$ and $\Theta^*_k(M) := \mathcal{R}^*_k(M) \times O(p \times q)$.
Let $\psi : [0, \infty) \to \mathbb{R}$ denote a non-negative nondecreasing function. For each subset $F \subset \mathbb{R}^q$ and
each $A \in O(p \times q)$, the FKM clustering loss function with a probability measure $Q$ on
$\mathbb{R}^p$ is defined by

$$\Psi(F, A, Q) := \int \min_{f \in F} \psi(\|A^T x - f\|) \, Q(dx).$$

Write

$$m_k(Q) := \inf_{(F, A) \in \Xi_k} \Psi(F, A, Q)$$

and

$$m^*_k(Q \mid M) := \inf_{(F, A) \in \Theta^*_k(M)} \Psi(F, A, Q).$$

For $\theta = (F, A) \in \Xi_k$, we will use both descriptions $\Psi(\theta, Q)$ and $\Psi(F, A, Q)$. The
set of population global optimizers and that of sample global optimizers will be denoted
by $\Theta' := \{\theta \in \Xi_k \mid m_k(P) = \Psi(\theta, P)\}$ and $\Theta'_n := \{\theta \in \Xi_k \mid m_k(P_n) = \Psi(\theta, P_n)\}$,
respectively. For each $M > 0$, let $\Theta^* := \{\theta \in \Theta^*_k(M) \mid m^*_k(P \mid M) = \Psi(\theta, P)\}$ and
$\Theta^*_n := \{\theta \in \Theta^*_k(M) \mid m^*_k(P_n \mid M) = \Psi(\theta, P_n)\}$. When we emphasize that $\Theta'$ and $\Theta'_n$ are
dependent on the index $k$, we write $\Theta'(k)$ and $\Theta'_n(k)$ instead of $\Theta'$ and $\Theta'_n$, respectively.
One of the measurable estimators in $\Theta'_n$ will be denoted by $\hat{\theta}_n$ or $\hat{\theta}_n(k)$. Similarly, let
$\hat{\theta}^*_n$ (or $\hat{\theta}^*_n(k)$) denote one of the measurable estimators in $\Theta^*_n$. Existence of measurable
estimators is guaranteed by the measurable selection theorem; see Section 6.7 of Pfanzagl
(1994) for a detailed explanation.

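Let $d_F(\cdot, \cdot)$ be the distance between two matrices based on the Frobenius norm and
$d_H(\cdot, \cdot)$ be the Hausdorff distance, which is defined for finite subsets $A, B \subset \mathbb{R}^q$ as

$$d_H(A, B) := \max\left\{ \max_{a \in A} \min_{b \in B} \|a - b\|, \ \max_{b \in B} \min_{a \in A} \|a - b\| \right\}.$$

We will denote the product distance with $d_F$ and $d_H$ by $d$. As was done by Terada (2012),
the distance between $\hat{\theta}_n$ and $\Theta'$ is defined as

$$d(\hat{\theta}_n, \Theta') := \inf\{d(\hat{\theta}_n, \theta) \mid \theta \in \Theta'\}.$$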
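For finite sets of centers, the Hausdorff distance is easy to evaluate directly from its definition. The following sketch assumes that each set is stored as an array whose rows are the points; the function name `hausdorff` is hypothetical.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets given as rows of A and B."""
    # D[i, j] = ||a_i - b_j|| for every pair of points.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Largest distance from a point of A to B, and from a point of B to A.
    return float(max(D.min(axis=1).max(), D.min(axis=0).max()))

F = np.array([[0.0, 0.0], [1.0, 0.0]])
G = np.array([[0.0, 0.0]])
dist_FG = hausdorff(F, G)
```

In this example the point $(1, 0)$ of `F` lies at distance 1 from the only point of `G`, so the distance is 1 even though `G` is contained in `F`.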

As in Pollard (1981) and Terada (2012), we assume that $\psi$ is continuous and $\psi(0) = 0$.
In addition, for controlling the growth of $\psi$, we assume that there exists $\lambda > 0$ such that
$\psi(2r) \le \lambda \psi(r)$ for all $r > 0$. Note that

$$\begin{aligned}
\int \psi(\|A^T x - f\|) \, P(dx) &\le \int \psi(\|A^T x\| + \|f\|) \, P(dx) \\
&\le \int \psi(\|x\| + \|f\|) \, P(dx) \\
&\le \int_{\|f\| > \|x\|} \psi(2\|f\|) \, P(dx) + \int_{\|f\| \le \|x\|} \psi(2\|x\|) \, P(dx) \\
&\le \psi(2\|f\|) + \lambda \int \psi(\|x\|) \, P(dx)
\end{aligned}$$

for all $f \in F$ and all $A \in O(p \times q)$. Thus, $\Psi(F, A, P)$ is finite for each $F \in \mathcal{R}_k$ and
$A \in O(p \times q)$ as long as $\int \psi(\|x\|) \, P(dx) < \infty$.
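As a worked instance of these assumptions, the quadratic choice $\psi(r) = r^2$ is continuous, nondecreasing on $[0, \infty)$, and vanishes at the origin; it recovers the FKM objective of Section 1, and the moment condition becomes a finite second moment. The short computation below is added for illustration and is not taken from the original text.

```latex
\psi(r) = r^{2}: \qquad
\psi(2r) = 4 r^{2} = 4\,\psi(r) \quad (\lambda = 4), \qquad
\Psi(F, A, P) = \int \min_{f \in F} \|A^{T} x - f\|^{2} \, P(dx) = FKM(F, A, P),
\qquad
\int \psi(\|x\|)\,P(dx) = \int \|x\|^{2}\,P(dx) < \infty .
```

More generally, $\psi(r) = r^{s}$ with $s > 0$ satisfies the growth condition with $\lambda = 2^{s}$.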

Let $R$ be a $q \times q$ orthonormal matrix, i.e., $R^T R = R R^T = I_q$. For each $f \in \mathbb{R}^q$ and
each $A \in O(p \times q)$, we have $A R^T \in O(p \times q)$ and

$$\int \psi(\|A^T x - f\|) \, P(dx) = \int \psi(\|R A^T x - R f\|) \, P(dx).$$

Hence, $\Theta'$ is not a singleton when $\Theta' \ne \emptyset$; that is, FKM clustering has rotational
indeterminacy, as does RKM clustering.
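The rotational indeterminacy can be checked numerically by replacing the empirical measure for $P$: rotating the centers by $R$ and replacing $A$ by $A R^T$ leaves the loss unchanged. The following sketch assumes $\psi(r) = r^2$ by default and random test data; the name `psi_loss` is hypothetical.

```python
import numpy as np

def psi_loss(X, F, A, psi=np.square):
    """Empirical loss: mean over rows x of min_f psi(||A^T x - f||)."""
    Y = X @ A
    dist = np.linalg.norm(Y[:, None, :] - F[None, :, :], axis=2)
    return float(psi(dist).min(axis=1).mean())

rng = np.random.default_rng(2)
n, p, q, k = 200, 5, 2, 3
X = rng.standard_normal((n, p))
A, _ = np.linalg.qr(rng.standard_normal((p, q)))   # column-wise orthonormal
F = rng.standard_normal((k, q))                    # rows are the centers f_j
R, _ = np.linalg.qr(rng.standard_normal((q, q)))   # q x q orthonormal matrix

loss_original = psi_loss(X, F, A)
# Centers R f_j are the rows of F R^T, and (A R^T)^T x = R A^T x.
loss_rotated = psi_loss(X, F @ R.T, A @ R.T)
```

The two losses agree up to floating-point error, and `A @ R.T` is again column-wise orthonormal, so every optimizer comes with a whole orbit of equivalent optimizers.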



4. The uniform SLLN and the continuity of $\Psi(\cdot, P)$



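Lemma 4.1. Let $M$ be an arbitrary positive number. Let $\mathcal{G}$ be the class of all $P$-integrable
functions on $\mathbb{R}^p$ of the form $g_{(F, A)}(x) := \min_{f \in F} \psi(\|A^T x - f\|)$, where $(F, A)$
takes all values over $\Theta^*_k(M)$. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$. Then,

$$\lim_{n \to \infty} \sup_{g \in \mathcal{G}} \left| \int g(x) \, P_n(dx) - \int g(x) \, P(dx) \right| = 0 \quad \text{a.s.}$$

Proof. Dehardt (1971) provided a sufficient condition for the uniform SLLN. Thus, it is
sufficient to prove that for all $\varepsilon > 0$, there exists a finite class of functions $\mathcal{G}_\varepsilon$ such that,
for each $g \in \mathcal{G}$, there are $\underline{g}$ and $\bar{g}$ in $\mathcal{G}_\varepsilon$ with $\underline{g} \le g \le \bar{g}$ and $\int \bar{g}(x) \, P(dx) - \int \underline{g}(x) \, P(dx) < \varepsilon$.

Choose an arbitrary $\varepsilon > 0$. Let $S_{p \times q}(\sqrt{q}) := \{X \in \mathbb{R}^{p \times q} \mid \|X\|_F = \sqrt{q}\}$. We will
denote by $D_{\delta_1}$ the finite set on $\mathbb{R}^q$ satisfying the condition that, for all $f \in B_q(M)$,
there exists $g \in D_{\delta_1}$ such that $\|f - g\| < \delta_1$. Similarly, we will denote by $\mathcal{A}_{p \times q, \delta_2}$ the
finite set on $S_{p \times q}(\sqrt{q})$ satisfying the condition that, for all $A \in S_{p \times q}(\sqrt{q})$, there exists
$B \in \mathcal{A}_{p \times q, \delta_2}$ such that $\|A - B\|_F < \delta_2$. Let $\mathcal{R}_{k, \delta_1} := \{F \in \mathcal{R}^*_k(M) \mid F \subset D_{\delta_1}\}$. Take $\mathcal{G}_\varepsilon$
as the finite class of functions of the form

$$\min_{f \in F'} \psi(\|A'^T x - f\| + \delta_1 + \delta_2 \|x\|) \quad \text{or} \quad \min_{f \in F'} \psi(\|A'^T x - f\| - \delta_1 - \delta_2 \|x\|),$$

where $(F', A')$ takes all values over $\mathcal{R}_{k, \delta_1} \times \mathcal{A}_{p \times q, \delta_2}$ and $\psi(r)$ is defined as zero for all
negative $r < 0$.

For any $F = \{f_1, \dots, f_k\} \in \mathcal{R}^*_k(M)$, there exists $F' = \{f'_1, \dots, f'_k\} \in \mathcal{R}_{k, \delta_1}$ with
$\|f_i - f'_i\| < \delta_1$ for each $i$. In addition, since $O(p \times q) \subset \bigcup_{A' \in \mathcal{A}_{p \times q, \delta_2}} \{A \mid \|A - A'\|_F < \delta_2\}$,
for any $A \in O(p \times q)$, there exists $A' \in \mathcal{A}_{p \times q, \delta_2}$ with $\|A - A'\|_F < \delta_2$. Corresponding to
each $g_{(F, A)} \in \mathcal{G}$, choose

$$\bar{g}_{(F, A)}(x) := \min_{f \in F'} \psi(\|A'^T x - f\| + \delta_1 + \delta_2 \|x\|)$$

and

$$\underline{g}_{(F, A)}(x) := \min_{f \in F'} \psi(\|A'^T x - f\| - \delta_1 - \delta_2 \|x\|).$$

Since $\psi$ is a monotone function and

$$\|A'^T x - f'_i\| - \delta_1 - \delta_2 \|x\| \le \|A^T x - f_i\| \le \|A'^T x - f'_i\| + \delta_1 + \delta_2 \|x\|$$

for each $i$ and each $x \in \mathbb{R}^p$, we have $\underline{g}_{(F, A)} \le g_{(F, A)} \le \bar{g}_{(F, A)}$.
Choosing $R > 0$ to be greater than $(M + \delta_1)/\sqrt{q}$, we obtain

$$\begin{aligned}
&\int \left[ \bar{g}_{(F, A)}(x) - \underline{g}_{(F, A)}(x) \right] P(dx) \\
&\quad \le \int \sum_{i=1}^{k} \left[ \psi(\|A'^T x - f'_i\| + \delta_1 + \delta_2 \|x\|) - \psi(\|A'^T x - f'_i\| - \delta_1 - \delta_2 \|x\|) \right] P(dx) \\
&\quad \le k \sup_{\|x\| \le R} \ \sup_{f \in B_q(M)} \ \sup_{A \in S_{p \times q}(\sqrt{q})} \left[ \psi(\|A^T x - f\| + \delta_1 + \delta_2 \|x\|) - \psi(\|A^T x - f\| - \delta_1 - \delta_2 \|x\|) \right] \\
&\qquad + 2k\lambda^m \int_{\|x\| > R} \psi(\|x\|) \, P(dx),
\end{aligned}$$

where $m \in \mathbb{N}$ is chosen to satisfy the requirement that $2\sqrt{q} + \delta_2 \le 2^m$. The second term
in the last bound of the inequality directly above can be made less than $\varepsilon/2$ by choosing $R$
to be sufficiently large. Note that $\psi$ is uniformly continuous on a bounded set. The first
term can be made less than $\varepsilon/2$ by choosing $\delta_1, \delta_2 > 0$ to be sufficiently small. Therefore, the
sufficient condition of the uniform SLLN for $\mathcal{G}$ is satisfied, and the proof is complete. □

Lemma 4.2. Let $M$ be an arbitrary positive number. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$.
Then, $\Psi(\cdot, P)$ is continuous on $\Theta^*_k(M)$.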

Proof. This lemma can be proven in a manner similar to the proof of Lemma 4.1. If
$(F, A), (G, B) \in \Theta^*_k(M)$ are chosen to satisfy $d_H(F, G) < \delta_1$ and $\|A - B\|_F < \delta_2$, then
for each $g \in G$ there exists $f(g) \in F$ such that $\|g - f(g)\| < \delta_1$. Choosing $R$ to be larger
than $M + \delta_1$, we obtain

$$\begin{aligned}
\Psi(F, A, P) - \Psi(G, B, P)
&= \int \left[ \min_{f \in F} \psi(\|A^T x - f\|) - \min_{g \in G} \psi(\|B^T x - g\|) \right] P(dx) \\
&\le \int \max_{g \in G} \left[ \psi(\|A^T x - f(g)\|) - \psi(\|B^T x - g\|) \right] P(dx) \\
&\le \int \sum_{g \in G} \left[ \psi(\|B^T x - g\| + \delta_1 + \delta_2 \|x\|) - \psi(\|B^T x - g\|) \right] P(dx) \\
&\le k \sup_{\|x\| \le R} \max_{g \in G} \left[ \psi(\|B^T x - g\| + \delta_1 + \delta_2 \|x\|) - \psi(\|B^T x - g\|) \right] \\
&\qquad + 2k\lambda^m \int_{\|x\| > R} \psi(\|x\|) \, P(dx),
\end{aligned} \qquad (4.1)$$

where $m \in \mathbb{N}$ is chosen to satisfy the condition that $2 + \delta_2 \le 2^m$. By choosing $R$
to be sufficiently large and $\delta_1, \delta_2 > 0$ to be sufficiently small, the last bound in the
inequality (4.1) can be made less than $\varepsilon$. Since for each $f \in F$ there exists $g(f) \in G$ such that
$\|f - g(f)\| < \delta_1$, the other inequality needed for continuity is obtained by interchanging
$(F, A)$ and $(G, B)$ in the inequality (4.1). □



5. Consistency theorem 

5.1. Existence of population global optimizers 

Our purpose is to prove that $\lim_{n \to \infty} d(\hat{\theta}_n, \Theta') = 0$ a.s. under some regularity conditions.
However, there is a possibility that $\Theta'$ is empty. Therefore, first, we provide sufficient
conditions for the existence of population global optimizers.

Proposition 5.1. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$ and that $m_j(P) > m_k(P)$ for $j =
1, 2, \dots, k - 1$. Then, $\Theta' \ne \emptyset$. Furthermore, there exists $M > 0$ such that $F \subset B_q(5M)$
for all $(F, A) \in \Theta'$.

Proof. See Appendix A. □

Under the assumption of Proposition 5.1, we can prove that $\Psi(\cdot, P)$ satisfies the
identification condition, which is a requirement of the consistency theorem.

Corollary 5.1. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$ and that $m_j(P) > m_k(P)$ for $j =
1, 2, \dots, k - 1$. Then, there exists $M_0 > 0$ such that for each $M > M_0$

$$\inf_{\theta \in \Theta^*_\varepsilon(M)} \Psi(\theta, P) > \inf_{\theta \in \Theta'} \Psi(\theta, P) \quad \text{for all } \varepsilon > 0,$$

where $\Theta^*_\varepsilon(M) := \{\theta \in \Theta^*_k(M) \mid d(\theta, \Theta') \ge \varepsilon\}$.

Proof. See Appendix A. □

5.2. Strong consistency of FKM clustering 

If the parameter space is restricted to $\Theta^*_k(M) \subset \Xi_k$, we easily obtain the strong consistency
of FKM clustering. Since $\Theta^*_k(M)$ is compact, we have $\Theta^* \ne \emptyset$ and the identification
condition:

$$\inf_{\theta \in \Theta^*_\varepsilon(M)} \Psi(\theta, P) > \inf_{\theta \in \Theta^*} \Psi(\theta, P) \quad \text{for all } \varepsilon > 0,$$

where $\Theta^*_\varepsilon(M) := \{\theta \in \Theta^*_k(M) \mid d(\theta, \Theta^*) \ge \varepsilon\}$.

Proposition 5.2. Let $M$ be an arbitrary positive number. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$.
Then,

$$\lim_{n \to \infty} d(\hat{\theta}^*_n, \Theta^*) = 0 \quad \text{a.s., and} \quad \lim_{n \to \infty} m^*_k(P_n \mid M) = m^*_k(P \mid M) \quad \text{a.s.}$$

Proof. From Lemma 4.1 and Lemma 4.2, we already have the uniform SLLN and the
continuity of $\Psi(\cdot, P)$ on $\Theta^*_k(M)$. Thus, this proposition is proved by an argument
similar to the proof of the consistency theorem. □

We cannot assume the uniqueness condition since FKM clustering has rotational
indeterminacy. In this study, as Terada (2012) did previously, we assume that $m_j(P) > m_k(P)$
for $j = 1, \dots, k - 1$. This condition implies that an optimal set $F(k)$ of cluster centers
has $k$ distinct elements. The following theorem provides sufficient conditions for the
strong consistency of FKM clustering.

Theorem 5.1. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$ and that $m_j(P) > m_k(P)$ for $j =
1, \dots, k - 1$. Then, $\Theta' \ne \emptyset$,

$$\lim_{n \to \infty} d(\hat{\theta}_n, \Theta') = 0 \quad \text{a.s., and} \quad \lim_{n \to \infty} m_k(P_n) = m_k(P) \quad \text{a.s.}$$

Proof. See Section 6. □

Note that if there exists a specific $A$ such that $\Psi(F, A, P) = 0$ for all $F$, that is, the
population distribution $P$ is degenerate and the number of dimensions of the support
of $P$ is $p - q$, then the condition $m_j(P) > m_k(P)$ for $j = 1, \dots, k - 1$ is not satisfied.

Since the theorem deals with almost sure convergence, there might exist null subsets of
$\Omega$ on which the strong consistency does not hold. Therefore, throughout the proof, $\Omega_1$
denotes the set obtained by avoiding a possible null set from $\Omega$.

First, we prove that there exists $M > 0$ such that, for sufficiently large $n$, at least one
center of the estimator $F_n \in \mathcal{R}_k$ is contained in $B_q(M)$.

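Lemma 6.1. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$. Then, there exists $M > 0$ such that

$$P\left( \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} \{\omega \mid \forall (F_m, A_m) \in \Theta'_m; \ F_m(\omega) \cap B_q(M) \ne \emptyset\} \right) = 1.$$

Proof. Choose an $r > 0$ to satisfy the condition that $P(B_p(r)) > 0$. Let us take $M$ to
be sufficiently large to ensure that $M > r$ and

$$\psi(M - r) P(B_p(r)) > \int \psi(\|x\|) \, P(dx). \qquad (6.1)$$

Note that $m_k(P_n) \le \Psi(F, A, P_n)$ for all $F \in \mathcal{R}_k$ and all $A \in O(p \times q)$. Let $F_0$ be the
singleton that consists of only the origin. By the SLLN, we obtain

$$\Psi(F_0, A, P_n) = \int \psi(\|A^T x\|) \, P_n(dx) \to \int \psi(\|A^T x\|) \, P(dx) \quad \text{a.s.}$$

for all $A \in O(p \times q)$. Since $\|A^T x\| \le \|x\|$, we have

$$\int \psi(\|A^T x\|) \, P(dx) \le \int \psi(\|x\|) \, P(dx)$$

for all $A \in O(p \times q)$.

Let $\Omega' := \{\omega \in \Omega_1 \mid \forall n \in \mathbb{N}; \ \exists m \ge n; \ \exists (F_m, A_m) \in \Theta'_m; \ F_m(\omega) \cap B_q(M) = \emptyset\}$. For all $\omega \in \Omega'$, there exists a
subsequence $\{n_i\}_{i \in \mathbb{N}}$ such that $F_{n_i}(\omega) \cap B_q(M) = \emptyset$. Since $\|A^T x - f\| \ge \|f\| - \|x\| > M - r$
for all $x \in B_p(r)$, all $f \notin B_q(M)$, and all $A \in O(p \times q)$, we have

$$\begin{aligned}
\limsup_{i} \Psi(F_{n_i}, A_{n_i}, P_{n_i})
&\ge \limsup_{i} \frac{1}{n_i} \sum_{l \in \{l \mid X_l \in B_p(r)\}} \min_{f \in F_{n_i}} \psi(\|A_{n_i}^T X_l - f\|) \\
&\ge \limsup_{i} \frac{1}{n_i} \sum_{l \in \{l \mid X_l \in B_p(r)\}} \psi(M - r) \\
&\ge \psi(M - r) P(B_p(r)).
\end{aligned}$$

From the assumptions made on the values of $M$, we have

$$\limsup_{i} \Psi(F_{n_i}, A_{n_i}, P_{n_i}) > \int \psi(\|x\|) \, P(dx),$$

which contradicts $m_k(P_n) \le \Psi(F, A, P_n)$ for all $F \in \mathcal{R}_k$ and all $A \in O(p \times q)$. Therefore,
we obtain $P(\Omega') = 0$; that is,

$$P\left( \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} \{\omega \mid \forall (F_m, A_m) \in \Theta'_m; \ F_m(\omega) \cap B_q(M) \ne \emptyset\} \right) = 1.$$

□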

By Lemma 6.1, without loss of generality, we can assume that each $F_n$ contains at
least one element of $B_q(M)$ when $n$ is sufficiently large. The next lemma indicates that
there exists $M > 0$ such that $B_q(5M)$ contains all the estimators of centers when $n$ is
sufficiently large.

According to the results of the previous subsection and similar arguments in the last
part of the proof of the consistency theorem, the conclusions of the theorem will be
proved when $k = 1$.

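Lemma 6.2. Under the assumption of the theorem, there exists $M > 0$ such that

$$P\left( \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} \{\omega \mid \forall (F_m, A_m) \in \Theta'_m; \ F_m(\omega) \subset B_q(5M)\} \right) = 1.$$

Proof. Choose $\varepsilon > 0$ sufficiently small such that $\varepsilon + m_k(P) < m_{k-1}(P)$. Let us take
$M > 0$ to satisfy the inequality (6.1) and

$$\lambda \int_{\|x\| > 2M} \psi(\|x\|) \, P(dx) < \varepsilon. \qquad (6.2)$$

Suppose that $F_n$ contains at least one center outside $B_q(5M)$. By Lemma 6.1, when
$n$ is sufficiently large, $F_n$ must contain at least one center in $B_q(M)$, say $f_1 \in B_q(M)$.
Since $\{x \mid \|A^T x\| > 2M\} \subset \{x \mid \|x\| > 2M\}$, we have

$$\begin{aligned}
\int_{\|A^T x\| > 2M} \psi(\|A^T x - f_1\|) \, P_n(dx)
&\le \int_{\|x\| > 2M} \psi(\|A^T x - f_1\|) \, P_n(dx) \\
&\le \int_{\|x\| > 2M} \psi(\|x\| + \|f_1\|) \, P_n(dx) \\
&\le \lambda \int_{\|x\| > 2M} \psi(\|x\|) \, P_n(dx)
\end{aligned}$$

for all $A \in O(p \times q)$. Let $F^*_n$ denote the set obtained by deleting all centers lying outside
$B_q(5M)$ from $F_n$. Since $(F^*_n, A) \in \Theta^*_{k-1}(5M)$ for all $A \in O(p \times q)$, we have

$$\Psi(F^*_n, A, P_n) \ge m^*_{k-1}(P_n \mid 5M) \ge m_{k-1}(P_n)$$

for all $A \in O(p \times q)$. For each $x \in B_p(2M)$ and each $A \in O(p \times q)$, we have

$$\|A^T x - f\| \ge \|f\| - \|x\| > 3M \quad \text{for all } f \notin B_q(5M)$$

and

$$\|A^T x - g\| \le \|x\| + \|g\| \le 3M \quad \text{for all } g \in B_q(M).$$

Thus, we obtain

$$\int_{\|x\| \le 2M} \min_{f \in F_n} \psi(\|A^T x - f\|) \, P_n(dx) = \int_{\|x\| \le 2M} \min_{f \in F^*_n} \psi(\|A^T x - f\|) \, P_n(dx)$$

for all $A \in O(p \times q)$.

Let $\Omega^* := \{\omega \in \Omega_1 \mid \forall n \in \mathbb{N}; \ \exists m \ge n; \ \exists (F_m, A_m) \in \Theta'_m; \ F_m(\omega) \not\subset B_q(5M)\}$. By the
axiom of choice, for an arbitrary $\omega \in \Omega^*$, there exists a subsequence $\{n_i\}_{i \in \mathbb{N}}$ such that
$F_{n_i}(\omega) \not\subset B_q(5M)$. By Proposition 5.2, we have

$$\lim_{n \to \infty} m^*_{k-1}(P_n \mid 5M) = m^*_{k-1}(P \mid 5M) \quad \text{a.s.}$$

For any $(F, A) \in \Xi_k$, we have

$$\begin{aligned}
m_{k-1}(P) &\le m^*_{k-1}(P \mid 5M) \le \liminf_{i} \Psi(F^*_{n_i}, A_{n_i}, P_{n_i}) \le \limsup_{i} \Psi(F^*_{n_i}, A_{n_i}, P_{n_i}) \\
&\le \limsup_{i} \left[ \int_{\|x\| \le 2M} \min_{f \in F^*_{n_i}} \psi(\|A_{n_i}^T x - f\|) \, P_{n_i}(dx) + \int_{\|x\| > 2M} \psi(\|A_{n_i}^T x - f_1\|) \, P_{n_i}(dx) \right] \\
&\le \limsup_{i} \left[ \Psi(F_{n_i}, A_{n_i}, P_{n_i}) + \lambda \int_{\|x\| > 2M} \psi(\|x\|) \, P_{n_i}(dx) \right] \\
&\le \limsup_{n} \left[ \Psi(F, A, P_n) + \lambda \int_{\|x\| > 2M} \psi(\|x\|) \, P_n(dx) \right].
\end{aligned} \qquad (6.3)$$

Choose $(F, A) \in \Theta'$ as $(F, A) \in \Xi_k$ in the last bound of the above inequality. By
the assumption on $M > 0$ and the SLLN, for a sufficiently large $n$, the last bound of
the inequality (6.3) can be made less than $m_k(P) + \varepsilon$, which is a contradiction. Therefore, we
obtain

$$P\left( \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} \{\omega \mid \forall (F_m, A_m) \in \Theta'_m; \ F_m(\omega) \subset B_q(5M)\} \right) = 1.$$

□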



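Hereafter, $M$ denotes a positive value satisfying inequalities (6.1) and (6.2). According
to Lemma 6.2, for all $(F_n, A_n) \in \Theta'_n$, $F_n \in \mathcal{R}^*_k(5M)$ when $n$ is sufficiently large. Since
$\mathcal{R}^*_k(5M)$ is compact, $\Theta^*_k(5M)$ is also compact.

By the uniform SLLN, the continuity of $\Psi(\cdot, \cdot, P)$ on $\Theta^*_k(5M)$ and Lemma 6.2, the
conclusion of the theorem for the cluster number $k$ can be proved in the same manner
as was done for the last part of the proof of the consistency theorem in Terada (2012).

Choose $\theta_0 \in \Theta^*_k(5M)$ such that $d(\theta_0, \Theta') > 0$. Write

$$\tilde{\theta}_n := \begin{cases} \hat{\theta}_n & \text{if } \hat{\theta}_n \in \Theta^*_k(5M), \\ \theta_0 & \text{if } \hat{\theta}_n \notin \Theta^*_k(5M). \end{cases}$$

It follows that

$$\limsup_{n} \left[ \Psi(\tilde{\theta}_n, P_n) - \inf_{\theta \in \Theta'} \Psi(\theta, P_n) \right] \le 0 \quad \text{a.s.}$$

Since $\limsup_n \Psi(\theta'_0, P_n) = m_k(P)$ for $\theta'_0 \in \Theta'$,

$$\limsup_{n} \inf_{\theta \in \Theta'} \Psi(\theta, P_n) \le \limsup_{n} \Psi(\theta'_0, P_n) = m_k(P) \quad \text{a.s.}$$

Hence, we have

$$\begin{aligned}
0 &\ge \limsup_{n} \Psi(\tilde{\theta}_n, P_n) - \limsup_{n} \inf_{\theta \in \Theta'} \Psi(\theta, P_n) \\
&\ge \limsup_{n} \Psi(\tilde{\theta}_n, P_n) - m_k(P) \quad \text{a.s.}
\end{aligned}$$

Let $\Theta^*_\varepsilon(5M) := \{\theta \in \Theta^*_k(5M) \mid d(\theta, \Theta') \ge \varepsilon\}$. By the uniform SLLN applied to $\Theta^*_\varepsilon(5M)$,
we obtain

$$\liminf_{n} \inf_{\theta \in \Theta^*_\varepsilon(5M)} \Psi(\theta, P_n) \ge \inf_{\theta \in \Theta^*_\varepsilon(5M)} \Psi(\theta, P) \quad \text{a.s.}$$

for all $\varepsilon > 0$. Fix an arbitrary $\varepsilon > 0$. By Corollary 5.1,

$$\liminf_{n} \inf_{\theta \in \Theta^*_\varepsilon(5M)} \Psi(\theta, P_n) > \limsup_{n} \Psi(\tilde{\theta}_n, P_n) \quad \text{a.s.}$$

Thus, for any $\omega \in \Omega_1$ there exists $n_0 \in \mathbb{N}$ such that

$$\inf_{\theta \in \Theta^*_\varepsilon(5M)} \Psi(\theta, P_n) > \Psi(\tilde{\theta}_n, P_n)$$

for all $n \ge n_0$. Conversely, suppose that $d(\tilde{\theta}_n, \Theta') \ge \varepsilon$ for some $n \ge n_0$. Then, we have

$$\inf_{\theta \in \Theta^*_\varepsilon(5M)} \Psi(\theta, P_n) \le \Psi(\tilde{\theta}_n, P_n),$$

which is a contradiction. Thus, we obtain

$$\lim_{n \to \infty} d(\tilde{\theta}_n, \Theta') = 0 \quad \text{a.s.}$$

By Lemma 6.2, we have $\tilde{\theta}_n = \hat{\theta}_n$ for a sufficiently large $n$. Therefore, we obtain

$$\lim_{n \to \infty} d(\hat{\theta}_n, \Theta') = 0 \quad \text{a.s.}$$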

Moreover, by the continuity of $\Psi(\cdot, P)$ on $\Theta^*_k(5M)$, we obtain

$$\lim_{n \to \infty} m_k(P_n) = m_k(P) \quad \text{a.s.}$$

7. Conclusion

In this study, we proved the strong consistency of FKM clustering under i.i.d. sampling
by using the framework of the proof for the consistency of RKM clustering used in
Terada (2012). We also derived a sufficient condition that ensures the existence of
population global optimizers of FKM clustering. Moreover, we proved the uniform SLLN
and the continuity of the FKM objective function in the proof of the consistency theorem.
Note that, as with RKM clustering, the sufficient condition for the strong consistency of
FKM clustering does not require the parameter space to be compact.

In the future, we will derive the rate of convergence of FKM clustering estimators and
propose an efficient criterion for determining the number of clusters.

Appendix A: Existence of $\Theta'$

Here we prove the existence of population global optimizers.

Lemma A.1. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$. There exists $M > 0$ such that

$$\inf_{A \in O(p \times q)} \Psi(F', A, P) > \inf_{\theta \in \Theta^*_k(M)} \Psi(\theta, P)$$

for all $F' \in \mathcal{R}_k$ satisfying $F' \cap B_q(M) = \emptyset$.

Proof. Conversely, suppose that, for all $M > 0$, there exists $F' \in \mathcal{R}_k$ such that $F' \cap
B_q(M) = \emptyset$ and

$$\inf_{A \in O(p \times q)} \Psi(F', A, P) \le \inf_{\theta \in \Theta^*_k(M)} \Psi(\theta, P).$$

Choose $r > 0$ such that the ball $B_p(r)$ has a positive $P$ measure; that is, $P(B_p(r)) > 0$.
Let $M$ be sufficiently large such that $M > r$ and that it satisfies inequality (6.1). Since
$\|A^T x - f\| \ge \|f\| - \|A^T x\| > M - r$ for all $f \notin B_q(M)$ and all $x \in B_p(r)$, we have

$$\begin{aligned}
\int \psi(\|x\|) \, P(dx) &\ge \inf_{\theta \in \Theta^*_k(M)} \Psi(\theta, P) \ge \inf_{A \in O(p \times q)} \Psi(F', A, P) \\
&\ge \inf_{A \in O(p \times q)} \int_{x \in B_p(r)} \min_{f \in F'} \psi(\|A^T x - f\|) \, P(dx) \\
&\ge \psi(M - r) P(B_p(r)).
\end{aligned}$$

This is a contradiction. □

Lemma A.2. Suppose that $\int \psi(\|x\|) \, P(dx) < \infty$, and for $j = 2, 3, \dots, k - 1$, $m_j(P) >
m_k(P)$. There exists $M > 0$ such that, for all $F' \in \mathcal{R}_k$ satisfying $F' \not\subset B_q(5M)$,

$$\inf_{A \in O(p \times q)} \Psi(F', A, P) > \inf_{\theta \in \Theta^*_k(5M)} \Psi(\theta, P).$$

Proof. Choose $M > 0$ to be sufficiently large to satisfy inequalities (6.1) and (6.2).
Suppose that, for all $M > 0$, there exists $F' \in \mathcal{R}_k$ satisfying $F' \not\subset B_q(5M)$ and

$$\inf_{A \in O(p \times q)} \Psi(F', A, P) \le \inf_{\theta \in \Theta^*_k(5M)} \Psi(\theta, P).$$

Let $\mathcal{R}'_k$ be the set of such $F'$ and then

$$m_k(P) = \inf_{\theta \in \mathcal{R}'_k \times O(p \times q)} \Psi(\theta, P).$$

According to Lemma A.1, each $F' \in \mathcal{R}'_k$ includes at least one point in $B_q(M)$, say $f_1$.
For all $x$ satisfying $\|x\| \le 2M$ and all $A \in O(p \times q)$, we obtain

$$\|A^T x - f\| > 3M \quad \text{for all } f \notin B_q(5M)$$

and

$$\|A^T x - g\| \le 3M \quad \text{for all } g \in B_q(M).$$

Thus,

$$\int_{\|x\| \le 2M} \min_{f \in F'} \psi(\|A^T x - f\|) \, P(dx) = \int_{\|x\| \le 2M} \min_{f \in F^*} \psi(\|A^T x - f\|) \, P(dx),$$

where the set $F^*$ is obtained by deleting all points outside $B_q(5M)$ from $F'$. Since
$\int_{\|x\| > 2M} \psi(\|A^T x - f_1\|) \, P(dx) \le \lambda \int_{\|x\| > 2M} \psi(\|x\|) \, P(dx)$, we obtain that

$$\begin{aligned}
\Psi(F', A, P) + \lambda \int_{\|x\| > 2M} \psi(\|x\|) \, P(dx)
&\ge \int_{\|x\| \le 2M} \min_{f \in F'} \psi(\|A^T x - f\|) \, P(dx) + \int_{\|x\| > 2M} \psi(\|A^T x - f_1\|) \, P(dx) \\
&\ge \Psi(F^*, A, P) \ge m_{k-1}(P)
\end{aligned}$$

for all $A \in O(p \times q)$. It follows that $m_k(P) + \varepsilon \ge m_{k-1}(P)$, which is a contradiction. □

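Let us consider $M > 0$ to be sufficiently large to satisfy inequalities (6.1) and (6.2).
Write $\Theta_k := \mathcal{R}^*_k(5M) \times O(p \times q)$. Proposition 5.1 and Corollary 5.1 can be proved in the
same way as Proposition 1 and Corollary 1 in Terada (2012).

Proof of Proposition 5.1. According to Lemma A.2,

$$\inf_{\theta \in \Xi_k} \Psi(\theta, P) = \inf_{\theta \in \Theta_k} \Psi(\theta, P).$$

Moreover, for any $\theta \in (\mathcal{R}_k \setminus \mathcal{R}^*_k(5M)) \times O(p \times q)$, $m_k(P) < \Psi(\theta, P)$. Thus, we only have
to prove $\Theta' \ne \emptyset$.

Let $C := \{\Psi(\theta, P) \mid \theta \in \Theta_k\}$ and then $m_k(P) = \inf C$. By the definition of the
infimum, for all $x > m_k(P)$, there exists $c \in C$ such that $c < x$. By the axiom of choice,
we can obtain a sequence $\{c_n\}_{n \in \mathbb{N}}$ such that $c_n \to m_k(P)$ as $n \to \infty$. Using the axiom of
choice again, we can obtain a sequence $\{\theta_n\}_{n \in \mathbb{N}}$ such that $\Psi(\theta_n, P) \to m_k(P)$ as $n \to \infty$.

By the compactness of $\Theta_k$, there exists a convergent subsequence of $\{\theta_n\}_{n \in \mathbb{N}}$, say
$\{\theta_{n_i}\}_{i \in \mathbb{N}}$. Let $\theta^* \in \Theta_k$ denote the limit of the subsequence $\{\theta_{n_i}\}_{i \in \mathbb{N}}$, i.e., $\theta_{n_i} \to \theta^*$ as $i \to \infty$.

Since $\Psi(\cdot, P)$ is continuous on $\Theta_k$, $\Psi(\theta^*, P) = m_k(P)$. Hence, we obtain $\Theta' \ne \emptyset$. □

Proof of Corollary 5.1. Let $\Theta'_k := \{\theta_k \in \Theta_k \mid \Psi(\theta_k, P) = m_k(P)\}$. Conversely, suppose
that there exists $\varepsilon > 0$ such that $\inf_{\theta \in \Theta^*_\varepsilon(5M)} \Psi(\theta, P) = \inf_{\theta \in \Theta'} \Psi(\theta, P)$. By the definition
of the infimum, there exists a sequence $\{\theta_n\}_{n \in \mathbb{N}}$ on $\Theta^*_\varepsilon(5M)$ such that $\Psi(\theta_n, P) \to m_k(P)$
as $n \to \infty$. By compactness of $\Theta_k$, there exists a convergent subsequence of $\{\theta_n\}_{n \in \mathbb{N}}$,
say $\{\theta_{m_i}\}_{i \in \mathbb{N}}$. Let $\theta^* \in \Theta_k$ denote the limit of the subsequence $\{\theta_{m_i}\}_{i \in \mathbb{N}}$. Since $\Psi(\cdot, P)$ is
continuous on $\Theta_k$, $\Psi(\theta^*, P) = m_k(P)$, that is, $\theta^* \in \Theta'_k$. Since $\theta_{m_i} \to \theta^*$ as
$i \to \infty$, we have $d(\theta_{m_i}, \Theta'_k) < \varepsilon$ for a sufficiently large $i$, which is a contradiction. □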